Skip to content

BUG: Lambda function returns KeyError in DataFrameGroupBy.agg #27921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Aug 30, 2019

Conversation

charlesdong1991
Copy link
Member

@charlesdong1991 charlesdong1991 commented Aug 14, 2019

order : List[Tuple[str, str]]
Pairs of the input and output column names.
order : List[int]
List of reordered index of columns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List of columns indices

@jreback jreback added this to the 0.25.1 milestone Aug 15, 2019
@charlesdong1991 charlesdong1991 changed the title BUG: Multiple lambdas for the same column return KeyError in DataFrameGroupBy.agg with named aggregation BUG: Lambda function returns KeyError in DataFrameGroupBy.agg Aug 15, 2019
@TomAugspurger TomAugspurger modified the milestones: 0.25.1, 1.0 Aug 15, 2019
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is growing to be pretty complex. I wonder: is it possible to pass order into maybe_mangle_lamdas? Then when we iterate over elements of func we might update the corresponding name in order?

@charlesdong1991
Copy link
Member Author

emm, not sure, could take a look, but probably will change few functions as well. This looks indeed a bit complex, but I found out there are quite few issues around it:

  1. allow lambda in groupby.agg
  2. correctly order different lambda functions and dataset which involves different lambdas on same column, mixed lambda and aggfunc.
  3. potentially could remove duplication warning, right now, if gb.agg(a=(col, 'min'), b=(col, 'min')) would raise an error, but gb.agg(a=(col, 'min'), b=(col, lambda x: np.min(x))) won't. Users might actually want to do it, so that they could use b column for other ops while based on np.min result. There is also an error test in test_aggregate.py which could be potentially removed.
    But I will take a look tomorrow to see if I can come up with a simpler version. @TomAugspurger

@pep8speaks
Copy link

pep8speaks commented Aug 16, 2019

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-26 18:03:11 UTC

@charlesdong1991
Copy link
Member Author

Hi, @TomAugspurger I simplified the code quite a lot, but still in _normalize_keyword_aggregation because i feel it is better than changing _maybe_mangle_lambdas. Right now, hopefully it wont look too complex. And I also add more complex test cases to prove the lambda works in different scenarios and order of columns is correct. It looks all good locally so feel free to take a look. ^^

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better, thanks.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a few unit tests that directly test _uniquify_aggfunc?

return aggspec, columns, order

# uniquify aggfunc name if duplicated in order list
uniquified_order = _uniquify_aggfunc(order)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify that the output example in the docstring still passes?

Can you also add a docstring example were this new code is hit in https://github.com/pandas-dev/pandas/pull/27921/files#diff-bfee1ba9e7cb79839776fac1a57ed940R1742?

Copy link
Member Author

@charlesdong1991 charlesdong1991 Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, verified. And added tests for _make_unique function, all tests pass locally @TomAugspurger

@charlesdong1991
Copy link
Member Author

wanna give a follow-up review? @TomAugspurger

@TomAugspurger
Copy link
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple lambdas for the same column return KeyError in DataFrameGroupBy.agg with named aggregation
4 participants